In this document, I run a simulation exercise to illustrate issues arising when estimating Two-Way Fixed Effect (TWFE) models with staggered and heterogeneous treatment.
Summary: if one only wants to use the data generated here and play with it without reading my whole spiel, they can use the function generate_data_TWFE. It takes many parameters but they are all pretty intuitive and described in the section “Modelisation choices” of this document. The other useful function is compute_simulation_TWFE. Taking similar parameters as generate_data_TWFE it generates the data and runs an estimation.
First, I load useful packages. Note that some packages are optional. If you do not want to install/use them, you may need to modify part of your code. mediocrethemes is my ggplot theme package. If you want to use it, you can find instructions here.
To illustrate issues arising with TWFE, I build simple fake data, estimate a TWFE model on it and replicate the analysis several times.
To simplify, I consider the assumptions described below. Of course these assumptions are purely arbitrary and I invite you to play with them. Note that, fixed effects and the covariate are not necessary to the analysis. I only add them to make the analysis more realistic if necessary but I set their baseline values to 0.
More precisely, I set:
\(N_i\) the number of individual
\(N_t\) the number of periods
\(\lambda_i \sim \mathcal{N}(\mu_{IFE}, \sigma_{IFE}^{2})\) the fixed effect for individual \(i\)
\(\eta_t \sim \mathcal{N}(\mu_{TFE}, \sigma_{TFE}^{2})\) the fixed effect for time period \(t\)
\(x_{it} \sim \mathcal{N}(\mu_{x}, \sigma_{x}^{2})\)
\(e_{it} \sim \mathcal{N}(0, \sigma_{e}^{2})\) some noise
\(T_{it}\) represent the treatment allocation, it is equal to one if individual \(i\) is treated at time \(t\) and 0 otherwise,
\(y_{it} = \alpha + \beta_{it} T_{it} + \gamma x_{it} + \lambda_i + \eta_t + e_{it}\) where \(\alpha\) and \(\gamma\) are some constants.
\(\beta_{it}\) is represents the magnitude of the treatment effect and is linked to the input parameter beta.
Across individuals, the treatment can either be:
het_indiv == homogeneous, for each individual, the treatment is equal to beta,het_indiv == random, for each individual, the treatment is drawn from \(\mathcal{U}(0.5\beta, 1.5\beta)\),het_indiv == large_first, for each individual, the treatment is equal to \(N_t - \beta\).Across time, the effect of the treatment can either be
het_time == constant,het_time == linear.I also create a bunch of variables that can be useful:
I write a simple function that generates the data. It takes as input the values of the different parameters and returns a data frame containing all the variables for this analysis.
generate_data_TWFE <- function(N_i,
N_t,
sigma_e,
p_treat,
staggered,
het_indiv,
het_time,
alpha,
beta,
mu_indiv_fe = 0,
sigma_indiv_fe = 0,
mu_time_fe = 0,
sigma_time_fe = 0,
mu_x = 0,
sigma_x = 0,
gamma = 0
) {
if (!is.logical(staggered)) {stop("staggered must be logical")}
if (!(het_indiv %in% c("large_first", "random", "homogeneous"))) {
stop('het_indiv must be either "large_first", "random" or "homogeneous"')
}
if (!(het_time %in% c("constant", "linear"))) {
stop('het_time must be either "constant" or "linear"')
}
data <- tibble(indiv = 1:N_i) %>%
mutate(in_treatment = (indiv %in% sample(1:N_i, floor(N_i*p_treat)))) %>%
crossing(t = 1:N_t) %>%
group_by(indiv) %>%
mutate(
indiv_fe = rnorm(1, mu_indiv_fe, sigma_indiv_fe),
t_event = ifelse(staggered, sample(2:(N_t - 1), 1), floor(N_t/2)),
#I use 2:(N_t-1) to have a pre and post period
t_event = ifelse(in_treatment, t_event, NA),
beta_i = case_when(
het_indiv == "large_first" ~ N_t-t_event,
het_indiv == "random" ~ runif(1, beta*0.5, beta*1.5),
het_indiv == "homogeneous" ~ beta
),
beta_i = ifelse(is.na(t_event), 0, beta_i)
) %>%
ungroup() %>%
group_by(t) %>%
mutate(time_fe = rnorm(1, mu_time_fe, sigma_time_fe)) %>%
ungroup() %>%
mutate(
post = (t > t_event),
treated = in_treatment & post,
beta_i = ifelse(
het_time == "linear" & post & !is.na(t_event),
beta_i*(t - t_event),
beta_i
),
t_centered = t - t_event,
x = rnorm(nrow(.), mu_x, sigma_x),
e = rnorm(nrow(.), 0, sigma_e),
y0 = alpha + gamma * x + indiv_fe + time_fe + e,
y1 = y0 + beta_i,
y = treated*y1 + (1 - treated)*y0
)
return(data)
}
I set baseline values for the parameters as very standard. These values are arbitrary.
baseline_parameters_TWFE <- tibble(
N_i = 20,
N_t = 50,
sigma_e = 1,
p_treat = 0.8,
staggered = TRUE,
het_indiv = "homogeneous",
het_time = "constant",
alpha = 1,
beta = 1
)
Here is an example of data created with the data generating process and baseline parameter values, for 2 individuals and 8 time periods:
baseline_parameters_TWFE %>%
mutate(N_i = 2, N_t = 8) %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
select(indiv, t, y, in_treatment, post, treated, t_centered, e) %>%
kable()
| indiv | t | y | in_treatment | post | treated | t_centered | e |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2.5952808 | TRUE | FALSE | FALSE | -4 | 1.5952808 |
| 1 | 2 | 1.3295078 | TRUE | FALSE | FALSE | -3 | 0.3295078 |
| 1 | 3 | 0.1795316 | TRUE | FALSE | FALSE | -2 | -0.8204684 |
| 1 | 4 | 1.4874291 | TRUE | FALSE | FALSE | -1 | 0.4874291 |
| 1 | 5 | 1.7383247 | TRUE | FALSE | FALSE | 0 | 0.7383247 |
| 1 | 6 | 2.5757814 | TRUE | TRUE | TRUE | 1 | 0.5757814 |
| 1 | 7 | 1.6946116 | TRUE | TRUE | TRUE | 2 | -0.3053884 |
| 1 | 8 | 3.5117812 | TRUE | TRUE | TRUE | 3 | 1.5117812 |
| 2 | 1 | 1.3898432 | FALSE | NA | FALSE | NA | 0.3898432 |
| 2 | 2 | 0.3787594 | FALSE | NA | FALSE | NA | -0.6212406 |
| 2 | 3 | -1.2146999 | FALSE | NA | FALSE | NA | -2.2146999 |
| 2 | 4 | 2.1249309 | FALSE | NA | FALSE | NA | 1.1249309 |
| 2 | 5 | 0.9550664 | FALSE | NA | FALSE | NA | -0.0449336 |
| 2 | 6 | 0.9838097 | FALSE | NA | FALSE | NA | -0.0161903 |
| 2 | 7 | 1.9438362 | FALSE | NA | FALSE | NA | 0.9438362 |
| 2 | 8 | 1.8212212 | FALSE | NA | FALSE | NA | 0.8212212 |
Let’s now have a look at different types of treatment and treatment allocations. First, let’s look at treatment allocation mechanisms. The allocation can either be staggered or not, the treatment homogeneous across individual or not and cconstant or not in time.
labs_graph_staggered <- labs(
title = "Treatement assignment across time and individuals",
x = "Time index",
y = "Individual id",
fill = "Treated"
)
baseline_parameters_TWFE %>%
mutate(staggered = FALSE) %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = factor(treated))) +
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_staggered +
labs(subtitle = "Non staggered")
baseline_parameters_TWFE %>%
mutate(staggered = TRUE) %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = factor(treated))) +
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_staggered +
labs(subtitle = "Staggered")
Now, let’s vary treatment effect size across individuals, considering a staggered adoption.
labs_graph_size <- labs(
title = "Treatement effect size across time and individuals",
x = "Time index",
y = "Individual id",
fill = "Treatment effect size"
)
baseline_parameters_TWFE %>%
mutate(het_indiv = "homogeneous", het_time = "constant") %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) +
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_size +
labs(subtitle = "Homogeneous treatment effect across individuals, constant in time")
baseline_parameters_TWFE %>%
mutate(het_indiv = "random", het_time = "constant") %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) +
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_size +
labs(subtitle = "Random treatment effect size across individuals, constant in time")
baseline_parameters_TWFE %>%
mutate(het_indiv = "large_first", het_time = "constant") %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) +
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_size +
labs(subtitle = "First treated have larger treatment effect, constant in time")
A last thing we can vary is that we can make individual effects increase linearly in time.
baseline_parameters_TWFE %>%
mutate(het_indiv = "homogeneous", het_time = "linear") %>%
pmap_dfr(generate_data_TWFE) %>% #use pmap to pass the set of parameters
ggplot(aes(x = t, y = factor(indiv), fill = round(treated*beta_i, 2))) + #treated*beta_i
geom_tile(color = "white", lwd = 0.3, linetype = 1) +
coord_fixed() +
labs_graph_size +
labs(subtitle = "Treatment effect increasing linearly in time")
One can now play with this function to generate their own data and run their own analyses. In the following sections, I try to run my own analyses.
First, I write a function ,estimate_TWFE. to run an estimation of a simple distributed lag model, in a simple event study type of analysis. My knowledge on the topic is limited and I built this analysis quickly. My analysis is likely to be flawed. One should thus use it with caution. Regardless, this code could hopefully be used as a template for a more informed analysis.
estimate_TWFE <- function(data) {
reg <- data %>%
mutate(
indiv = as.factor(indiv),
t = as.factor(t),
treated = as.numeric(treated),
in_treatment = as.numeric(in_treatment),
t_centered = as.factor(t_centered)
) %>%
feols(
data = .,
fml = y ~ in_treatment:t_centered | indiv + t
) %>%
broom::tidy() %>%
rename(p_value = p.value, se = std.error) %>%
mutate(term = as.numeric(str_remove_all(term, "in_treatment\\:t_centered"))) %>%
rename(lag = term) %>%
select(-statistic) %>%
suppressMessages() #Warning saying that NA values dropped and
#that one or two factors are removed due to colinearity
return(reg)
}
Here is an example output for such a simulation (limited to the 15 first lags).
baseline_parameters_TWFE %>%
pmap_dfr(generate_data_TWFE) %>%
estimate_TWFE() %>%
slice(1:15) %>%
kable()
| lag | estimate | se | p_value |
|---|---|---|---|
| -47 | 48.35164 | 72.51067 | 0.5150134 |
| -46 | 48.52208 | 71.78158 | 0.5093493 |
| -45 | 47.23598 | 70.96010 | 0.5157290 |
| -44 | 47.40445 | 70.15433 | 0.5095047 |
| -43 | 47.37527 | 69.55334 | 0.5061656 |
| -42 | 44.98009 | 68.74585 | 0.5228276 |
| -41 | 45.95534 | 67.93711 | 0.5090587 |
| -40 | 44.75955 | 67.06675 | 0.5146613 |
| -39 | 44.44709 | 66.48944 | 0.5139813 |
| -38 | 43.86001 | 65.56009 | 0.5136580 |
| -37 | 43.11749 | 65.02702 | 0.5173463 |
| -36 | 42.05929 | 63.90543 | 0.5204164 |
| -35 | 41.68371 | 63.25326 | 0.5198866 |
| -34 | 41.07402 | 62.68442 | 0.5222292 |
| -33 | 41.62932 | 61.67068 | 0.5099311 |
To run a whole simulation, I create the function compute_simulation_TWFE. This simple function takes as input the various parameters aforementioned an returns a table with the estimate of the treatment, its p-value and standard error and all input parameters.
Before writing this function, I compute the true effect (the ATT) to add it to the output of the simulation.
I can then compute the simulation.
compute_simulation_TWFE <- function(N_i,
N_t,
sigma_e,
p_treat,
staggered,
het_indiv,
het_time,
alpha,
beta,
mu_indiv_fe = 0,
sigma_indiv_fe = 0,
mu_time_fe = 0,
sigma_time_fe = 0,
mu_x = 0,
sigma_x = 0,
gamma = 0) {
data <- generate_data_TWFE(
N_i = N_i,
N_t = N_t,
sigma_e = sigma_e,
p_treat = p_treat,
staggered = staggered,
het_indiv = het_indiv,
het_time = het_time,
alpha = alpha,
beta = beta,
mu_indiv_fe = mu_indiv_fe,
sigma_indiv_fe = sigma_indiv_fe,
mu_time_fe = mu_time_fe,
sigma_time_fe = sigma_time_fe,
mu_x = mu_x,
sigma_x = sigma_x,
gamma = gamma
)
data %>%
estimate_TWFE() %>%
mutate(
N_i = N_i,
N_t = N_t,
sigma_e = sigma_e,
p_treat = p_treat,
staggered = staggered,
het_indiv = het_indiv,
het_time = het_time,
alpha = alpha,
beta = beta,
mu_indiv_fe = mu_indiv_fe,
sigma_indiv_fe = sigma_indiv_fe,
mu_time_fe = mu_time_fe,
sigma_time_fe = sigma_time_fe,
mu_x = mu_x,
sigma_x = sigma_x,
gamma = gamma
) %>%
left_join(compute_true_effect_TWFE(data), by = "lag")
}
Here is the output of one simulation:
baseline_parameters_TWFE %>%
pmap_dfr(compute_simulation_TWFE)
# A tibble: 92 x 21
lag estimate se p_value N_i N_t sigma_e p_treat staggered
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 -45 -0.229 2.09 0.914 20 50 1 0.8 TRUE
2 -44 -1.03 2.01 0.617 20 50 1 0.8 TRUE
3 -43 -0.476 1.80 0.794 20 50 1 0.8 TRUE
4 -42 -2.26 1.81 0.231 20 50 1 0.8 TRUE
5 -41 -0.593 2.02 0.774 20 50 1 0.8 TRUE
6 -40 -0.376 2.03 0.856 20 50 1 0.8 TRUE
7 -39 -0.965 1.72 0.583 20 50 1 0.8 TRUE
8 -38 -2.04 1.94 0.309 20 50 1 0.8 TRUE
9 -37 -1.26 1.90 0.519 20 50 1 0.8 TRUE
10 -36 -0.265 1.99 0.896 20 50 1 0.8 TRUE
# … with 82 more rows, and 12 more variables: het_indiv <chr>,
# het_time <chr>, alpha <dbl>, beta <dbl>, mu_indiv_fe <dbl>,
# sigma_indiv_fe <dbl>, mu_time_fe <dbl>, sigma_time_fe <dbl>,
# mu_x <dbl>, sigma_x <dbl>, gamma <dbl>, true_effect <dbl>
To analyze the results, I build a simple function to run the regression and graph the results. It takes as inputs the baseline parameters and our parameters of interest and return a graph of the lag-estimates.
graph_results <- function(baseline_parameters,
staggered = TRUE,
het_indiv = "homogeneous",
het_time = "constant") {
baseline_parameters["staggered"] <- staggered
baseline_parameters["het_indiv"] <- het_indiv
baseline_parameters["het_time"] <- het_time
graph <- baseline_parameters %>%
pmap_dfr(compute_simulation_TWFE) %>%
filter(dplyr::between(lag, -5, 5)) %>%
mutate(
estimate_level = (estimate - estimate[which(lag == 0)]),
true_effect_level = (true_effect - true_effect[which(lag == 0)]),
lag = as.integer(lag)
) %>%
ggplot(aes(x = lag, y = estimate_level)) +
geom_point() +
geom_point(aes(x = lag, y = true_effect_level), shape = 1) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
geom_pointrange(aes(
ymin = estimate_level-1.96*se,
ymax = estimate_level+1.96*se)) +
labs(
x = "Lag",
y = "Estimate (centered)",
title = "Representation of estimates for each lag",
subtitle = paste(
ifelse(staggered, "Staggered,", "Non staggered,"),
het_indiv, ",",
het_time,
"treatment"),
caption = "Hollow points represent the centered true effect
Vertical bars represent 95% confidence intervals"
)
return(graph)
}
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "large_first", het_time = "linear")
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "random", het_time = "linear")
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "homogeneous", het_time = "linear")
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "large_first", het_time = "constant")
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "random", het_time = "constant")
baseline_parameters_TWFE %>%
graph_results(staggered = TRUE, het_indiv = "homogeneous", het_time = "constant")
Out of these simulations, we notice that in most cases, the two ways fixed effect model yields incorrect estimates. Further and more precise analysis should and could be ran here. Note that for some types of treatment, the variation is too limited to be able to precisely estimate the effect of interest.
I then try to compare more systematically the performance of the TWFE model in the different settings. To do so I run a Monte-Carlo simulation, creating many independent data sets and running the event study on each of them. To do so, I then compute the pre-treatment bias, ie the presence of pre-trends, the post-treatment bias and RMSE for each simulation.
I run the simulations for different sets of parameters by mapping the compute_simulation_TWFE function on each set of parameters. I enclose these parameters into a table, param_TWFE. Note that in this table each set of parameters appears n_iter times as we want to run the analysis \(n_{iter}\) times for each set of parameters.
I can now run the simulations by mapping the compute_simulation_TWFE function on param_TWFE.
I want to compute the summary variables of interest (average pre-treatment bias, post-treatment bias , RMSE). To make bias comparable across simulations, I normalize it by dividing it by the mean of the true effects. This normalization method is somehow arbitrary. To run these analyses, I only consider -5 to +5 lags around the treatment time.
simulations_TWFE <- readRDS(here("TWFE/Outputs/simulations_TWFE.RDS"))
summarise_simulations <- function(data) {
data %>%
filter(dplyr::between(lag, -5, 5)) %>%
mutate(mean_true_effect = mean(true_effect, na.rm = TRUE)) %>%
group_by(across(c(-estimate, -se, -p_value, -true_effect))) %>%
#group by basically all the DGP parameters
summarise(
bias = mean((estimate - true_effect)/mean_true_effect, na.rm = TRUE),
rmse = sqrt(mean(((estimate - true_effect)/mean_true_effect)^2)),
.groups = "drop"
) %>%
mutate(post = ifelse(lag >= 0, "post", "pre")) %>%
group_by(across(c(-lag, -bias, -rmse))) %>%
summarise(
bias = mean(bias),
rmse = mean(rmse),
.groups = "drop"
) %>%
pivot_wider(
names_from = post,
values_from = c(bias, rmse)
) %>%
select(-rmse_pre)
}
summary_simulations_TWFE <- summarise_simulations(simulations_TWFE)
# saveRDS(summary_simulations_TWFE, here("TWFE/Outputs/summary_simulations_TWFE.RDS"))
Then, we quickly graph these results to analyze them. Note that I only varied 2 parameters across simulations, het_indiv and het_time.
summary_simulations_TWFE %>%
select(het_indiv, het_time, bias_post, bias_pre, rmse_post) %>%
pivot_longer(cols = c(bias_post, bias_pre, rmse_post), names_to = "statistics") %>%
mutate(
statistics = str_to_title(str_replace(statistics, "_", " ")),
het_indiv = str_to_title(str_replace(het_indiv, "_", " ")),
het_indiv = paste("Indiv:", het_indiv),
het_time = str_to_title(str_replace(het_time, "_", " ")),
het_time = paste("Time:", het_time),
) %>%
# filter(statistics != "rmse_post") %>%
ggplot(aes(x = "", y = value, fill = statistics)) +
geom_col(position = "dodge") +
facet_grid(het_indiv ~ het_time) +
labs(
title = "Comparison of different statistics across different treatment types",
x = "",
y = "Mean value for each statistic",
fill = ""
)
One can notice that the TWFE issue discussed here is much more prevalent in the case of effects that are non constant in time and that increase in time. As discussed in the literature, in cases with non constant treatment effects in time and where treatment effect are larger for units treated first (an rather common setting), TWFE effects may yield highly incorrect values.